Abstract

The problem of state communication over a discrete memoryless channel with discrete memoryless
state is studied when the state information is available strictly causally at the encoder. It is shown that
block Markov encoding, in which the encoder communicates a description of the state sequence in
the previous block by incorporating side information about the state sequence at the decoder, yields
the minimum state estimation error. When the same channel is used to send additional independent
information at the expense of a higher channel state estimation error, the optimal tradeoff between the
rate of the independent information and the state estimation error is characterized via the
capacity-distortion function. It is shown that any optimal tradeoff pair can be achieved via rate-splitting.
These coding theorems are then extended optimally to the case of causal channel state information at the
encoder using the Shannon strategy.

I. Introduction

The problem of information transmission over channels with state (also referred to as state-dependent
channels) is classical. One of the most interesting models is the scenario in which the channel state is
available at the encoder either causally or noncausally. This framework has been extensively studied for
independent and identically distributed (i.i.d.) states, starting from the pioneering work of Shannon [26],
Kusnetsov and Tsybakov [18], Gelfand and Pinsker [12], and Heegard and El Gamal [14]; see a recent
survey by Keshet, Steinberg, and Merhav [16].

Most of the existing literature has focused on determining the channel capacity or devising practical 
capacity-achieving coding techniques for this channel. In certain communication scenarios, however, the
encoder may instead wish to help reveal the channel state to the decoder. In this paper, we study this
problem of state communication over a discrete memoryless channel (DMC) with discrete memoryless
(DM) state, in which the encoder has either strictly causal or causal state information and wishes
to help reveal it to the decoder with some fidelity criterion. This problem is motivated by a wide
array of applications, including multimedia information hiding in Moulin and O'Sullivan [22], digital
watermarking in Chen and Wornell [2], data storage over memory with defects in Kusnetsov and Tsybakov
[18] and Heegard and El Gamal [14], secret communication systems in Lee and Xiang [19], dynamic
spectrum access systems in Mitola [21] and later in Devroye, Mitran, and Tarokh [10], and underwater
acoustic/sonar applications in Stojanovic [27]. Each of these problems can be expressed as a problem
of conveying the channel state to the decoder. For instance, the encoder may be able to monitor the 
interference level in the channel; it only attempts to carry out communication when the interference level 
is low and additionally assists the decoder in estimating the interference for better decoder performance. 
We show that block Markov encoding, in which the encoder communicates a description of the state 
sequence in the previous block by incorporating side information about the state sequence at the decoder, 
is optimal for communicating the state when the state information is strictly causally available at the 
encoder. For the causal case, this block Markov coding scheme coupled with incorporating the current 
channel state using the Shannon strategy turns out to be optimal. 

This same channel can also be used to send additional independent information. This is, however, 
accomplished at the expense of a higher channel state estimation error. We characterize the tradeoff 
between the amount of independent information that can be reliably transmitted and the accuracy at 
which the decoder can estimate the channel state via the capacity-distortion function, which is to be 
distinguished from the usual rate-distortion function in source coding. We show that any optimal tradeoff 
can be achieved via rate-splitting, whereby the encoder appropriately allocates its rate between information 
transmission and state communication. 

The problem of joint communication and state estimation was introduced in [29], which studied
the capacity-distortion tradeoff for the Gaussian channel with additive Gaussian state when the state
information is noncausally available at the encoder; see Sutivong [28] for the general case. The other
extreme case was studied later in [31], in which both the encoder and the decoder are assumed to
be oblivious of the channel state; the capacity of the channel subject to a distortion constraint is
determined. This paper connects these two sets of prior results by considering causal (i.e., temporally
partial) information of the state at the encoder.

Note that the problem of communicating the causally (or noncausally) available state and independent
information over a state-dependent channel was also studied in [17], and its dual problem of
communicating independent information while masking the state was studied by Merhav and Shamai [20].
Instead of reconstructing the state in some fidelity criterion, however, the focus in [17] was the optimal
tradeoff between the information transmission rate and the state uncertainty reduction rate (the list
decoding exponent of the state). We will later elucidate the connection between the results in [17] and
our results.

The rest of this paper is organized as follows. Section II describes the basic channel model with discrete 
alphabets, characterizes the minimum distortion in estimating the state, establishes its achievability and 
proves the converse part of the theorem. Section III extends the results to the information rate-distortion 
tradeoff setting, wherein we evaluate the capacity-distortion function with strictly causal state at the 
encoder. Since the intuition gained from the study of the strictly causal setup carries over when the 
encoder has causal knowledge of the state sequence, the causal case is treated only briefly in Section IV,
with key examples provided. Finally, Section V concludes the paper.

Throughout the paper, we closely follow the notation in [11]. In particular, a random variable is denoted
by an upper case letter (e.g., $X, Y, Z$) and its realization is denoted by a lower case letter (e.g., $x, y, z$).
The shorthand notation $X^n$ is used to denote the tuple (or the column vector) of random variables
$(X_1, \ldots, X_n)$, and $x^n$ is used to denote their realizations. The notation $X^n \sim p(x^n)$ means that
$p(x^n)$ is the probability mass function (pmf) of the random vector $X^n$. Similarly,
$Y^n \,|\, \{X^n = x^n\} \sim p(y^n|x^n)$ means that $p(y^n|x^n)$ is the conditional pmf of $Y^n$ given
$\{X^n = x^n\}$. For $X \sim p(x)$ and $\epsilon \in (0, 1)$, we define the set of $\epsilon$-typical
$n$-sequences $x^n$ (or the typical set in short) [24] as

$$\mathcal{T}_\epsilon^{(n)}(X) = \big\{x^n : \big|\,|\{i : x_i = x\}|/n - p(x)\big| \le \epsilon p(x) \text{ for all } x \in \mathcal{X}\big\}.$$

We say that $X \to Y \to Z$ form a Markov chain if $p(x, y, z) = p(x)p(y|x)p(z|y)$, that is, $X$ and $Z$ are
conditionally independent of each other given $Y$. Finally, $C(x) = (1/2)\log(1 + x)$ denotes the Gaussian
capacity function.
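
The typical-set condition above can be checked mechanically. The following is a minimal sketch, not part of the original development; the function name `is_typical` and the dictionary representation of the pmf are assumptions made for illustration, and the non-strict inequality follows the convention of [11].

```python
from collections import Counter

def is_typical(seq, pmf, eps):
    """Return True if seq is eps-typical with respect to pmf.

    pmf maps every symbol of the alphabet to its probability. The check is
    |count(x)/n - p(x)| <= eps * p(x) for every symbol x of the alphabet.
    """
    n = len(seq)
    counts = Counter(seq)
    # A symbol outside the alphabet can never appear in a typical sequence.
    if any(symbol not in pmf for symbol in counts):
        return False
    return all(abs(counts.get(x, 0) / n - p) <= eps * p for x, p in pmf.items())
```

For instance, with the (hypothetical) pmf `{0: 0.75, 1: 0.25}`, a length-20 sequence with fifteen zeros and five ones is typical for any `eps > 0`, while the all-ones sequence is not typical for `eps = 0.1`.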

II. Problem Setup and Main Result 

Consider a point-to-point communication system with state depicted in Fig. 1. Suppose that the
encoder has strictly causal access to the channel state sequence $S^n$ and wishes to communicate the
state to the decoder. We assume a DMC with a DM state model
$(\mathcal{X} \times \mathcal{S}, p(y|x,s)p(s), \mathcal{Y})$ that consists of a finite input alphabet
$\mathcal{X}$, a finite output alphabet $\mathcal{Y}$, a finite state alphabet $\mathcal{S}$, and a collection
of conditional pmfs $p(y|x,s)$ on $\mathcal{Y}$. The channel is memoryless in the sense that, without feedback,
$p(y^n|x^n,s^n) = \prod_{i=1}^n p_{Y|X,S}(y_i|x_i,s_i)$, and the state is memoryless in the sense that the
sequence $(S_1, S_2, \ldots)$ is independent and identically distributed (i.i.d.) with $S_i \sim p_S(s_i)$.

Fig. 1. Strictly causal state communication.

An $(|\mathcal{S}|^n, n)$ code for strictly causal state communication over the DMC with DM state consists of

• an encoder that assigns a symbol $x_i(s^{i-1}) \in \mathcal{X}$ to each past state sequence
$s^{i-1} \in \mathcal{S}^{i-1}$ for $i \in [1:n]$, and

• a decoder that assigns an estimate $\hat{s}^n \in \hat{\mathcal{S}}^n$ to each received sequence
$y^n \in \mathcal{Y}^n$.

The fidelity of the state estimate is measured by the expected distortion

$$E(d(S^n, \hat{S}^n)) = \frac{1}{n} \sum_{i=1}^n E(d(S_i, \hat{S}_i)),$$

where $d: \mathcal{S} \times \hat{\mathcal{S}} \to [0, \infty)$ is a distortion measure between a state symbol
$s \in \mathcal{S}$ and a reconstruction symbol $\hat{s} \in \hat{\mathcal{S}}$. Without loss of generality,
we assume that for every symbol $s \in \mathcal{S}$ there exists a reconstruction symbol
$\hat{s} \in \hat{\mathcal{S}}$ such that $d(s, \hat{s}) = 0$.

A distortion $D$ is said to be achievable if there exists a sequence of $(|\mathcal{S}|^n, n)$ codes such that

$$\limsup_{n \to \infty} E(d(S^n, \hat{S}^n)) \le D.$$
We next characterize the minimum distortion $D^*$, which is the infimum of all achievable distortions $D$.

Theorem 1: The minimum distortion for strictly causal state communication is

$$D^* = \min E(d(S, \hat{S})),$$

where the minimum is over all conditional pmfs $p(x)p(u|x,s)$ and functions $\hat{s}(u,x,y)$ such that

$$I(U,X;Y) \ge I(U,X;S).$$

To illustrate this result, we consider the following. 

Example 1 (Quadratic Gaussian state communication): Consider the Gaussian channel with additive
Gaussian state [5]

$$Y = X + S + Z,$$

where the state $S \sim N(0, Q)$ and the noise $Z \sim N(0, N)$ are independent. Assume an expected average
transmission power constraint

$$\sum_{i=1}^n E\big(x_i^2(S^{i-1})\big) \le nP,$$

where the expectation is over the random state sequence $S^n$. We assume the squared error (quadratic)
distortion measure $d(s, \hat{s}) = (s - \hat{s})^2$.

We compare different transmission strategies for estimating the state at the decoder. In the classical
communication paradigm, the encoder would ignore its knowledge of the channel state (since the strictly
causal state information at the encoder does not increase the channel capacity) and transmit an
agreed-upon training sequence to the decoder. The minimum distortion is then achieved by estimating the
state $S_i$ via minimum mean squared error (MMSE) estimation from the noisy observation
$Y_i = X_i + S_i + Z_i$ and equals

$$D = E((S_i - \hat{S}_i)^2) = E(S_i^2) - \frac{Q^2}{Q+N} = \frac{QN}{Q+N}.$$

Note that the result is independent of the particular sequence $X_i$, i.e., one could "send" $X_i = 0$,
$i \in [1:n]$. This distortion is optimal when the encoder is oblivious of the state sequence, as shown in [31].

Alternatively, a block Markov coding scheme can be performed, in which the encoder communicates a
description of the state sequence in the previous block using a capacity-achieving code. This strategy is
similar to a source-channel separation scheme, whereby the state sequence is treated as a source and the
compressed version of the source is sent across the noisy channel at a rate lower than the capacity. Since
the distortion-rate function of the state is $D(R) = Q 2^{-2R}$ (see, for example, [8]) and the capacity of
the channel (with strictly causal state information at the encoder) is $C = C(P/(Q + N))$, the distortion
achieved by this coding scheme is $D = D(C) = Q(Q + N)/(P + Q + N)$. It is straightforward to see
that for the same values of $P$, $Q$, and $N$, ignoring the state knowledge at the encoder can offer a lower
distortion than using this (suboptimal) block Markov encoding scheme.

The minimum distortion, however, can be achieved by another block Markov coding scheme, in which
the encoder communicates a description of the state sequence in the previous block by incorporating
side information $(X, Y)$ about the state $S$ of the previous block at the decoder. This strategy is
equivalent to setting $X \sim N(0, P)$, $U = S + \tilde{S}$, where $\tilde{S} \sim N(0, QN/P)$ is independent
of $(S, X)$, and $\hat{S} = E(S|U,X,Y) = E(S \,|\, S + \tilde{S}, S + Z)$ in Theorem 1. This strategy yields the
minimum distortion given by $D^* = QN/(P + Q + N)$. (The proof of optimality is given in Section III.) This
strategy, in effect, replaces $D(R) = Q 2^{-2R}$ of the last scheme with the Wyner-Ziv distortion-rate
function $D_{WZ}(R) = (QN/(Q + N)) 2^{-2R}$ (see [30]), and the minimum distortion $D^*$ can be evaluated
by computing $D_{WZ}(C)$.
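
The three strategies of Example 1 can be compared numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, rates are measured in bits (logarithms to base 2), and the values of $P$, $Q$, $N$ in the usage note are arbitrary. The last helper checks, through the standard formula for the MMSE of a Gaussian variable observed through two independent Gaussian noises, which description-noise variance makes the resulting error equal to $QN/(P+Q+N)$.

```python
import math

def gaussian_capacity(x):
    """C(x) = (1/2) log2(1 + x), in bits per transmission."""
    return 0.5 * math.log2(1.0 + x)

def distortion_ignore_state(Q, N):
    """MMSE of S ~ N(0, Q) from S + Z when the encoder ignores the state."""
    return Q * N / (Q + N)

def distortion_separation(P, Q, N):
    """D(C) with D(R) = Q 2^{-2R} and C = C(P / (Q + N))."""
    return Q * 2.0 ** (-2.0 * gaussian_capacity(P / (Q + N)))

def distortion_with_side_information(P, Q, N):
    """D_WZ(C) with D_WZ(R) = (QN / (Q + N)) 2^{-2R} and C = C(P / (Q + N))."""
    return (Q * N / (Q + N)) * 2.0 ** (-2.0 * gaussian_capacity(P / (Q + N)))

def mmse_two_observations(Q, var_description, var_noise):
    """MMSE of S ~ N(0, Q) from S + S~ and S + Z with independent Gaussian noises."""
    return 1.0 / (1.0 / Q + 1.0 / var_description + 1.0 / var_noise)
```

With the assumed values $P = 1$, $Q = 1$, $N = 2$, the three distortions are $2/3$, $3/4$, and $1/2$, respectively, so ignoring the state beats plain separation here, and the scheme with decoder side information beats both. Under the same assumptions, `mmse_two_observations(Q, Q * N / P, N)` also evaluates to $1/2$.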
In the following two subsections, we prove Theorem 1.

A. Proof of Achievability 

We use $b$ transmission blocks, each block consisting of $n$ symbols. In block $j$, a description of the
state sequence $S^n(j-1)$ in block $j-1$ is sent.

Codebook generation. Fix a conditional pmf $p(x)p(u|x,s)$ and function $\hat{s}(u,x,y)$ such that
$I(U,X;Y) > I(U,X;S)$, and let $p(u|x) = \sum_s p(s)p(u|x,s)$. For each $j \in [1:b]$, randomly and
independently generate $2^{nR_s}$ sequences $x^n(l_{j-1})$, $l_{j-1} \in [1:2^{nR_s}]$, each according to
$\prod_{i=1}^n p_X(x_i)$. For each $l_{j-1} \in [1:2^{nR_s}]$, randomly and conditionally independently
generate $2^{n\tilde{R}_s}$ sequences $u^n(k_j|l_{j-1})$, $k_j \in [1:2^{n\tilde{R}_s}]$, each according to
$\prod_{i=1}^n p_{U|X}(u_i|x_i(l_{j-1}))$. Partition the set of indices $k_j \in [1:2^{n\tilde{R}_s}]$ into
equal-size bins $\mathcal{B}(l_j) = [(l_j - 1)2^{n(\tilde{R}_s - R_s)} + 1 : l_j 2^{n(\tilde{R}_s - R_s)}]$,
$l_j \in [1:2^{nR_s}]$. This defines the codebook

$$\mathcal{C}_j = \{(x^n(l_{j-1}), u^n(k_j|l_{j-1})) : l_{j-1} \in [1:2^{nR_s}],\ k_j \in [1:2^{n\tilde{R}_s}]\}, \quad j \in [1:b].$$

The codebook is revealed to both the encoder and the decoder.
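
The binning step of the codebook generation amounts to simple index arithmetic. The following sketch is an illustration only; the helper names are hypothetical, indices are 1-based as in the text, and the number of description indices is assumed to be an integer multiple of the number of bins.

```python
def bin_size(num_descriptions, num_bins):
    """Size of each bin when the description indices are split into equal bins."""
    if num_descriptions % num_bins != 0:
        raise ValueError("number of descriptions must be a multiple of number of bins")
    return num_descriptions // num_bins

def bin_of(k, size):
    """Bin index l (1-based) that contains description index k (1-based)."""
    return (k - 1) // size + 1

def bin_members(l, size):
    """Description indices in bin l, i.e., [(l - 1) * size + 1 : l * size]."""
    return range((l - 1) * size + 1, l * size + 1)
```

For example, with 64 description indices and 16 bins (hypothetical values standing in for $2^{n\tilde{R}_s}$ and $2^{nR_s}$), each bin holds 4 indices and `bin_of(5, 4)` returns 2. The encoder would transmit the codeword indexed by `bin_of(k, size)`, and the decoder would search only over `bin_members(l, size)`.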

Encoding. By convention, let $l_0 = 1$. At the end of block $j$, the encoder finds an index $k_j$ such that

$$(s^n(j), u^n(k_j|l_{j-1}), x^n(l_{j-1})) \in \mathcal{T}_{\epsilon'}^{(n)}.$$

If there is more than one such index, it selects one of them uniformly at random. If there is no such
index, it selects an index from $[1:2^{n\tilde{R}_s}]$ uniformly at random. In block $j + 1$ the encoder
transmits $x^n(l_j)$, where $l_j$ is the bin index of $k_j$.

Decoding. Let $\epsilon > \epsilon'$. At the end of block $j + 1$, the decoder finds the unique index
$\hat{l}_j$ such that $(x^n(\hat{l}_j), y^n(j+1)) \in \mathcal{T}_\epsilon^{(n)}$. (If there is more than one
such index, it selects one of them uniformly at random. If there is no such index, it selects an index from
$[1:2^{nR_s}]$ uniformly at random.) It then finds the unique index $\hat{k}_j \in \mathcal{B}(\hat{l}_j)$
such that $(u^n(\hat{k}_j|\hat{l}_{j-1}), x^n(\hat{l}_{j-1}), y^n(j)) \in \mathcal{T}_\epsilon^{(n)}$. Finally it
computes the reconstruction sequence as
$\hat{s}_i(j) = \hat{s}(u_i(\hat{k}_j|\hat{l}_{j-1}), x_i(\hat{l}_{j-1}), y_i(j))$ for $i \in [1:n]$.

Analysis of expected distortion. Let $L_{j-1}, K_j, L_j$ be the indices chosen in block $j$. We bound the
distortion averaged over the random choice of the codebooks $\mathcal{C}_j$, $j \in [1:b]$. Define the
"error" event

$$\mathcal{E}(j) = \{(S^n(j), U^n(\hat{K}_j|\hat{L}_{j-1}), X^n(\hat{L}_{j-1}), Y^n(j)) \notin \mathcal{T}_\epsilon^{(n)}\}$$

and consider the events

$$\mathcal{E}_1(j) = \{(S^n(j), U^n(K_j|L_{j-1}), X^n(L_{j-1}), Y^n(j)) \notin \mathcal{T}_\epsilon^{(n)}\},$$
$$\mathcal{E}_2(j-1) = \{\hat{L}_{j-1} \ne L_{j-1}\},$$
$$\mathcal{E}_2(j) = \{\hat{L}_j \ne L_j\},$$
$$\mathcal{E}_3(j) = \{\hat{K}_j \ne K_j\}.$$

Then by the union of events bound,

$$P\{\mathcal{E}(j)\} \le P\{\mathcal{E}_1(j)\} + P\{\mathcal{E}_2(j-1)\} + P\{\mathcal{E}_2(j)\} + P\{\mathcal{E}_2^c(j-1) \cap \mathcal{E}_2^c(j) \cap \mathcal{E}_3(j)\}.$$

We bound each term. For the first term, let

$$\tilde{\mathcal{E}}_1(j) = \{(S^n(j), U^n(K_j|L_{j-1}), X^n(L_{j-1})) \notin \mathcal{T}_{\epsilon'}^{(n)}\}$$

and note that

$$P\{\mathcal{E}_1(j)\} \le P\{\tilde{\mathcal{E}}_1(j)\} + P\{\tilde{\mathcal{E}}_1^c(j) \cap \mathcal{E}_1(j)\}.$$

By the independence of the codebooks (in particular, the independence of $L_{j-1}$ and $\mathcal{C}_j$) and
the covering lemma [11, Sec. 3.7], $P\{\tilde{\mathcal{E}}_1(j)\}$ tends to zero as $n \to \infty$ if
$\tilde{R}_s > I(U;S|X) + \delta(\epsilon')$. Since $\epsilon > \epsilon'$ and
$Y^n(j) \,|\, \{U^n(K_j|L_{j-1}) = u^n, X^n(L_{j-1}) = x^n, S^n(j) = s^n\} \sim \prod_{i=1}^n p_{Y|X,S}(y_i|x_i,s_i)$,
by the conditional typicality lemma [11, Sec. 2.5],
$P\{\tilde{\mathcal{E}}_1^c(j) \cap \mathcal{E}_1(j)\}$ tends to zero as $n \to \infty$.

Next, by the same independence of the codebooks and the packing lemma [11, Sec. 3.2],
$P\{\mathcal{E}_2(j-1)\}$ and $P\{\mathcal{E}_2(j)\}$ tend to zero as $n \to \infty$ if
$R_s < I(X;Y) - \delta(\epsilon)$. Finally, following the same steps as in the analysis of the Wyner-Ziv
coding scheme [11, Sec. 11.3] (in particular, the analysis of $\mathcal{E}_3$), it can be readily shown that
$P\{\mathcal{E}_2^c(j-1) \cap \mathcal{E}_2^c(j) \cap \mathcal{E}_3(j)\}$ tends to zero as $n \to \infty$ if
$\tilde{R}_s - R_s < I(U;Y|X) - \delta(\epsilon)$. Combining the bounds and eliminating $\tilde{R}_s$ and
$R_s$, we have shown that $P\{\mathcal{E}(j)\}$ tends to zero as $n \to \infty$ if
$I(U,X;Y) > I(U;S|X) + \delta(\epsilon') + 2\delta(\epsilon) = I(U,X;S) + \delta'(\epsilon)$, which is
satisfied by our choice of $p(x)p(u|x,s)$ for $\epsilon$ sufficiently small.

When there is no "error," $(S^n(j), U^n(\hat{K}_j|\hat{L}_{j-1}), X^n(\hat{L}_{j-1}), Y^n(j)) \in \mathcal{T}_\epsilon^{(n)}$.
Thus, by the law of total expectation and the typical average lemma [11, Sec. 2.4], the asymptotic
distortion averaged over the random codebook, encoding, and decoding is upper bounded as

$$\limsup_{n \to \infty} E(d(S^n(j), \hat{S}^n(j))) \le \limsup_{n \to \infty} \big(d_{\max} P\{\mathcal{E}(j)\} + (1 + \epsilon) E(d(S, \hat{S})) P\{\mathcal{E}^c(j)\}\big) \le (1 + \epsilon) E(d(S, \hat{S})),$$

March 1, 2013 DRAFT 



where $d_{\max} = \max_{(s,\hat{s}) \in \mathcal{S} \times \hat{\mathcal{S}}} d(s, \hat{s}) < \infty$. By taking
$\epsilon \to 0$ and $b \to \infty$, any distortion larger than $E(d(S, \hat{S}))$ is achievable for a fixed
conditional pmf $p(x)p(u|x,s)$ and function $\hat{s}(u,x,y)$ satisfying $I(U,X;Y) > I(U,X;S)$. Finally, by
the continuity of mutual information terms in $p(x)p(u|x,s)$, the same conclusion holds when we relax the
strict inequality to $I(U,X;Y) \ge I(U,X;S)$. This completes the achievability proof of Theorem 1.

B. Proof of the Converse 

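In this section, we prove that for every code, the achieved distortion is lower bounded as $D \ge D^*$.
Given an $(|\mathcal{S}|^n, n)$ code, we identify the auxiliary random variables
$U_i = (S^{i-1}, Y_{i+1}^n)$, $i \in [1:n]$. Note that, as desired, $U_i \to (X_i, S_i) \to Y_i$ form a Markov
chain for $i \in [1:n]$. Consider

$$\begin{aligned}
\sum_{i=1}^n I(U_i, X_i; S_i) &= \sum_{i=1}^n I(S^{i-1}, Y_{i+1}^n, X_i; S_i) \\
&\overset{(a)}{=} \sum_{i=1}^n I(S^{i-1}, Y_{i+1}^n; S_i) \\
&= \sum_{i=1}^n \big(I(S^{i-1}; S_i) + I(Y_{i+1}^n; S_i | S^{i-1})\big) \\
&\overset{(b)}{=} \sum_{i=1}^n I(Y_{i+1}^n; S_i | S^{i-1}) \\
&\overset{(c)}{=} \sum_{i=1}^n I(S^{i-1}; Y_i | Y_{i+1}^n) \\
&\le \sum_{i=1}^n I(S^{i-1}, Y_{i+1}^n; Y_i) \\
&= \sum_{i=1}^n I(U_i, X_i; Y_i), \qquad (1)
\end{aligned}$$

where (a) follows since $X_i$ is a function of $S^{i-1}$, (b) follows since $S^n$ is i.i.d., and (c) follows
by the Csiszár sum identity (see, e.g., [11, Sec. 2.3]). The last equality again uses the fact that $X_i$ is
a function of $S^{i-1}$.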

Let $Q$ be a time-sharing random variable, uniformly distributed over $[1:n]$ and independent of
$(X^n, S^n, Y^n)$, and let $U = (Q, U_Q)$, $X = X_Q$, $S = S_Q$, and $Y = Y_Q$. It can be easily verified
that $X$ is independent of $S$ and $U \to (X, S) \to Y$ form a Markov chain. Furthermore,

$$\begin{aligned}
I(U, X; S) &\overset{(a)}{=} I(U_Q, X_Q; S_Q | Q) \\
&= \frac{1}{n} \sum_{i=1}^n I(U_i, X_i; S_i) \\
&\overset{(b)}{\le} \frac{1}{n} \sum_{i=1}^n I(U_i, X_i; Y_i) \\
&= I(U_Q, X_Q; Y_Q | Q) \\
&\le I(U, X; Y),
\end{aligned}$$

where (a) follows since $Q$ is independent of $S_Q$, (b) follows from (1), which holds for the given code,
and the last step follows since $I(U_Q, X_Q; Y_Q | Q) \le I(Q, U_Q, X_Q; Y_Q)$.

To lower bound the expected distortion of the given code, we rely on the following result. 

Lemma 1: Suppose $Z \to V \to W$ form a Markov chain and $d(z, \hat{z})$ is a distortion measure. Then for
every reconstruction function $\hat{z}(v, w)$, there exists a reconstruction function $\hat{z}^*(v)$ such that

$$E[d(Z, \hat{z}^*(V))] \le E[d(Z, \hat{z}(V, W))].$$

This extremely useful lemma traces back to Blackwell's notion of channel ordering [1], [25] and can
be interpreted as a "data processing inequality" for estimation. In the context of network information
theory, it has been utilized by Kaspi [13] (see also [11, Section 20.3.3]) and appeared in the above simple
form in [3]. For completeness, the proof of this lemma is provided in Appendix A.

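Now consider

$$\begin{aligned}
E(d(S^n, \hat{S}^n)) &= \frac{1}{n} \sum_{i=1}^n E[d(S_i, \hat{s}_i(Y^n))] \\
&\overset{(a)}{\ge} \frac{1}{n} \sum_{i=1}^n \min_{s_i^*(u_i, x_i, y_i)} E[d(S_i, s_i^*(U_i, X_i, Y_i))] \\
&= \min_{s^*(u, x, y)} E[d(S, s^*(U, X, Y))],
\end{aligned}$$

where (a) follows from Lemma 1 by identifying $S_i$ as $Z$, $(U_i, X_i, Y_i) = (S^{i-1}, X_i, Y_i^n)$ as $V$,
and $Y^{i-1}$ as $W$, and noting that $S_i \to (S^{i-1}, X_i, Y_i^n) \to Y^{i-1}$ form a Markov chain. This
completes the proof of Theorem 1.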

C. Lossless Communication 

Suppose that the state sequence needs to be communicated losslessly, i.e.,
$\lim_{n \to \infty} P\{\hat{S}^n \ne S^n\} = 0$. We can establish the following consequence of Theorem 1.

Corollary 1: If $H(S) < \Delta^* = \max_{p(x)} I(X, S; Y)$, then the state sequence can be communicated
losslessly. Conversely, if the state sequence can be communicated losslessly, then $H(S) \le \Delta^*$.

To prove this, consider the special case of $\hat{\mathcal{S}} = \mathcal{S}$ and the Hamming distortion
measure $d(s, \hat{s})$ (i.e., $d(s, \hat{s}) = 0$ if $s = \hat{s}$ and $1$ if $s \ne \hat{s}$). By setting
$U = S$ in the achievability proof of Theorem 1 in Subsection II-A and noting that no "error" implies that
$\hat{S}^n = S^n$, we can conclude that the state sequence can be communicated losslessly if
$I(X, S; Y) > H(S)$ for some $p(x)$, that is, if $\Delta^* > H(S)$. The converse follows immediately since the
lossless condition that the block error probability $P\{\hat{S}^n \ne S^n\}$ tends to zero as $n \to \infty$
implies the zero Hamming distortion condition that the average symbol error probability
$(1/n) \sum_{i=1}^n P\{\hat{S}_i \ne S_i\}$ tends to zero as $n \to \infty$. Combining this observation with the
converse proof of Theorem 1 in Subsection II-B, we can conclude that $H(S)$ must be less than or equal to
$\Delta^*$.
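
The condition of Corollary 1 can be evaluated numerically for small alphabets. The sketch below is only illustrative: the function names are hypothetical, the maximization over $p(x)$ is a grid search restricted to a binary input, and the channel $Y = X \oplus S \oplus Z$ with the particular values of $q$ and $p$ is an assumed example (for which the maximum of $I(X,S;Y)$ equals $1 - H(p)$, attained by the uniform input).

```python
import math

def entropy(pmf):
    """Entropy in bits of a pmf given as a sequence of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def info_xs_y(px, ps, channel):
    """I(X,S;Y) in bits for independent X ~ px and S ~ ps.

    channel[x][s] is the pmf of Y given (X, S) = (x, s).
    """
    num_outputs = len(channel[0][0])
    py = [0.0] * num_outputs
    cond_entropy = 0.0
    for x, p_x in enumerate(px):
        for s, p_s in enumerate(ps):
            weight = p_x * p_s
            cond_entropy += weight * entropy(channel[x][s])
            for y in range(num_outputs):
                py[y] += weight * channel[x][s][y]
    return entropy(py) - cond_entropy

def max_info_binary_input(ps, channel, grid=1001):
    """Grid-search approximation of the maximum of I(X,S;Y) over binary p(x)."""
    return max(info_xs_y([1 - a, a], ps, channel)
               for a in (i / (grid - 1) for i in range(grid)))

# Assumed example: Y = X xor S xor Z with S ~ Bern(q) and Z ~ Bern(p).
q, p = 0.1, 0.05
channel = [[[1 - p, p] if (x ^ s) == 0 else [p, 1 - p] for s in (0, 1)]
           for x in (0, 1)]
max_info = max_info_binary_input([1 - q, q], channel)
lossless_possible = entropy([1 - q, q]) < max_info
```

For these assumed parameters $H(S)$ is strictly smaller than the computed maximum, so the sufficient condition of the corollary holds.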

Remark 1: If we define $\Delta^* = \max_{p(x)} I(X, S; Y)$, then $\min\{H(S), \Delta^*\}$ characterizes the state
uncertainty reduction rate, which captures the performance of the optimal list decoder for the state
sequence (see [17] for the exact definition). The proof of this result again follows from Theorem 1 by
letting $\hat{\mathcal{S}}$ be the set of pmfs on $\mathcal{S}$ and $d(s, \hat{s}) = \log(1/\hat{s}(s))$ be the logarithmic distortion measure and
adapting the technique by Courtade and Weissman 0. 

III. Capacity-Distortion Tradeoff 

Now suppose that in addition to the state sequence $S^n$, the encoder wishes to communicate a message
$M$ independent of $S^n$. What is the optimal tradeoff between the rate $R$ of the message and the distortion
$D$ of state estimation?

A $(2^{nR}, n)$ code for strictly causal state communication consists of

• a message set $[1:2^{nR}]$,

• an encoder that assigns a symbol $x_i(m, s^{i-1}) \in \mathcal{X}$ to each message $m \in [1:2^{nR}]$ and
past state sequence $s^{i-1} \in \mathcal{S}^{i-1}$ for $i \in [1:n]$, and

• a decoder that assigns a message estimate $\hat{m} \in [1:2^{nR}]$ (or an error message e) and a state
sequence estimate $\hat{s}^n \in \hat{\mathcal{S}}^n$ to each received sequence $y^n \in \mathcal{Y}^n$.

We assume that $M$ is uniformly distributed over the message set. The average probability of error is
defined as $P_e = P\{\hat{M} \ne M\}$. As before, the channel state estimation error is defined as
$E(d(S^n, \hat{S}^n))$. A rate-distortion pair $(R, D)$ is said to be achievable if there exists a sequence of
$(2^{nR}, n)$ codes such that $\lim_{n \to \infty} P_e = 0$ and
$\limsup_{n \to \infty} E(d(S^n, \hat{S}^n)) \le D$. The capacity-distortion function $C_{SC}(D)$ is the
supremum of the rates $R$ such that $(R, D)$ is achievable.

We characterize this optimal tradeoff between information transmission rate (capacity $C$) and state
estimation (distortion $D$) as follows.

Theorem 2: The capacity-distortion function for strictly causal state communication is

$$C_{SC}(D) = \max\,\big(I(U, X; Y) - I(U, X; S)\big), \qquad (2)$$

where the maximum is over all conditional pmfs $p(x)p(u|x,s)$ with $|\mathcal{U}| \le |\mathcal{S}| + 2$ and
functions $\hat{s}(u, x, y)$ such that $E(d(S, \hat{S})) \le D$.

The proof of Theorem 2 is similar to the zero-rate case in Theorem 1 and thus we defer it to
Appendix B. Note that the inverse of the capacity-distortion function, namely, the distortion-capacity
function for strictly causal state communication, is

$$D_{SC}(C) = \min E(d(S, \hat{S})), \qquad (3)$$

where the minimum is over all conditional pmfs $p(x)p(u|x,s)$ and functions $\hat{s}(u, x, y)$ such that
$I(U, X; Y) - I(U, X; S) \ge C$. By setting $C = 0$ in (3), we recover Theorem 1. (More interestingly, we can
recover Theorem 2 from Theorem 1 by considering a supersource $S' = (S, W)$, where the message source $W$ is
independent of $S$, and two distortion measures: the Hamming distortion measure $d(w, \hat{w})$ and a generic
distortion measure $d(s, \hat{s})$.) At the other extreme, by setting $D = \infty$ in (2), we recover the
capacity expression

$$C = \max_{p(x)} I(X; Y) \qquad (4)$$

of a DMC with DM state when the state information is available strictly causally at the encoder. (Unlike
the general tradeoff in Theorem 2, strictly causal state information is useless when communicating the
message alone.) Finally, by setting $U = \emptyset$ in Theorem 2, we recover the result in [31] on the
capacity-distortion function when the state information is not available at the encoder.

Remark 2: Theorem 2 (as well as Theorem 1) holds for any finite delay, that is, whenever the encoder
is defined as $x_i(m, s^{i-d})$ for some $d \in [1:\infty)$. More generally, it continues to hold as long as
the delay is sublinear in the block length $n$.

Remark 3: The characterization of the capacity-distortion function in Theorem 2, albeit very compact,
does not bring out the intrinsic tension between state estimation and independent information transmission.
It can be alternatively written as

$$C_{SC}(D) = \max_{p(x),\, D_x :\, E_X(D_X) \le D} \big(I(X; Y) - E_X[R_{WZ}^{(X)}(D_X)]\big), \qquad (5)$$

where

$$R_{WZ}^{(x)}(D) = \min_{p(u|x,s),\, \hat{s}(u,x,y) :\, E[d(S, \hat{s}(U, x, Y))] \le D} I(U; S | x, Y), \quad x \in \mathcal{X},$$

is the Wyner-Ziv rate-distortion function with side information $(x, Y)$. The rate $R_{WZ}^{(x)}(D_x)$ can be
viewed as the price the encoder pays to estimate the channel state at the decoder under distortion $D_x$ by
signaling with $x$. In particular, if $R_{WZ}^{(x)}(D)$ is independent of $x$ for a fixed $D$ (i.e.,
$R_{WZ}^{(x)}(D) = R_{WZ}(D)$), then by the convexity of the Wyner-Ziv rate-distortion function, the
alternative characterization of $C_{SC}(D)$ in (5) simplifies to

$$C_{SC}(D) = C_{SC}(\infty) - R_{WZ}(D), \qquad (6)$$

where

$$R_{WZ}(D) = R_{WZ}^{(x)}(D), \quad x \in \mathcal{X}.$$

Thus, in this case the capacity is achieved by splitting the unconstrained capacity $C_{SC}(\infty)$ into
information transmission and lossy source coding of the past state sequence with side information $(X, Y)$.
This simple characterization will be very useful in evaluating the capacity-distortion function in several
examples.

Remark 4: Along the same lines as [17], the optimal tradeoff between the state uncertainty reduction
rate $\Delta$ and independent information transmission rate $R$ can be characterized as the set of
$(R, \Delta)$ pairs such that

$$\begin{aligned}
R &\le I(X; Y), \\
\Delta &\le H(S), \\
R + \Delta &\le I(X, S; Y)
\end{aligned}$$

for some $p(x)$. This result includes both the state uncertainty reduction rate in Remark 1 and the channel
capacity in (4) as special cases.

In the following subsections, we illustrate Theorem 2 via simple examples.

A. Injective Deterministic Channels 

Suppose that the channel output

$$Y = y(X, S)$$

is a function of $X$ and $S$ such that given every $x \in \mathcal{X}$, the function $y(x, s)$ is injective
(one-to-one) in $s$. This condition implies that $H(Y|X) = H(S)$ for every $p(x)$. For this class of injective
deterministic channels, the characterization of the capacity-distortion function in Theorem 2 can be greatly
simplified.

Proposition 1: The capacity-distortion function of the injective deterministic channel is

$$C_{SC}(D) = C_{SC}(0) = \max_{p(x)} I(X; Y) = \max_{p(x)} \big(H(Y) - H(S)\big). \qquad (7)$$

In other words, we can achieve the unconstrained channel capacity as well as perfect state estimation.
This is no surprise since the injective condition implies that given the channel input $X$ and output $Y$, the
state $S$ can be recovered losslessly. Note that this result is independent of the distortion measure
$d(s, \hat{s})$ as long as our critical assumption (for every $s$, there exists an $\hat{s}$ with
$d(s, \hat{s}) = 0$) is satisfied.

To prove achievability in Proposition 1, substitute $U = Y$ in Theorem 2. For the converse, consider

$$\begin{aligned}
I(U, X; Y) - I(U, X; S) &= I(X; Y) - \big(I(U; S|X) - I(U; Y|X)\big) \\
&= I(X; Y) - \big(H(U|Y, X) - H(U|X, S)\big) \\
&\overset{(a)}{=} I(X; Y) - \big(H(U|Y, X) - H(U|Y, X, S)\big) \\
&= I(X; Y) - I(U; S|Y, X) \\
&\overset{(b)}{=} I(X; Y),
\end{aligned}$$

where (a) follows since $Y = y(X, S)$ and (b) follows from the injective condition.

Example 2 (Gaussian channel with additive Gaussian state and no noise): Consider the channel

$$Y = X + S,$$

where the state $S \sim N(0, Q)$. Assume the squared error distortion measure and an expected average
power constraint $P$ on $X$. The capacity-distortion function of this channel is

$$C_{SC}(D) = C(P/Q) \text{ for all } D,$$

which is the capacity without state estimation.

Example 3 (Binary symmetric channel with additive Bernoulli state and no noise): Consider the channel

$$Y = X \oplus S,$$

where $X$ and $Y$ are binary and the state $S \sim \mathrm{Bern}(q)$. Assume the Hamming distortion measure.
The capacity-distortion function of this channel is

$$C_{SC}(D) = 1 - H(q) \text{ for all } D.$$
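
The value in Example 3 can be checked against Proposition 1 numerically. The following sketch is only an illustration; the function names and the grid search over the input distribution are assumptions. It computes $I(X;Y)$ directly from the joint pmf of $(X, Y)$ for $Y = X \oplus S$, which can then be compared with $H(Y) - H(S)$ and maximized over $p(x)$.

```python
import math

def binary_entropy(t):
    """Binary entropy function H(t) in bits."""
    if t in (0.0, 1.0):
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def mutual_information_xor(a, q):
    """I(X;Y) in bits for Y = X xor S with independent X ~ Bern(a), S ~ Bern(q)."""
    px = [1 - a, a]
    ps = [1 - q, q]
    # Given X = x, the output is y exactly when S = x xor y.
    joint = [[px[x] * ps[x ^ y] for y in (0, 1)] for x in (0, 1)]
    py = [joint[0][y] + joint[1][y] for y in (0, 1)]
    return sum(joint[x][y] * math.log2(joint[x][y] / (px[x] * py[y]))
               for x in (0, 1) for y in (0, 1) if joint[x][y] > 0)

def capacity_xor_state(q, grid=1001):
    """Grid-search approximation of the maximum of I(X;Y) over p(x)."""
    return max(mutual_information_xor(i / (grid - 1), q) for i in range(grid))
```

For any $q$ in $(0, 1/2]$, `capacity_xor_state(q)` should agree with `1 - binary_entropy(q)` up to numerical precision, since the maximum is attained by the uniform input, which lies on the grid.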

In the following subsections, we extend the above two examples to the more general cases where there 
is additive noise. 
B. Gaussian Channel with Additive Gaussian State

We revisit the Gaussian channel with additive Gaussian state and noise (see Example 1)

$$Y = X + S + Z,$$

where $S \sim N(0, Q)$ and $Z \sim N(0, N)$. As before, we assume an expected average power constraint $P$
and the squared error distortion measure $d(s, \hat{s}) = (s - \hat{s})^2$.

We note the following extreme cases of the capacity-distortion function:

• If $N = 0$, then $C_{SC}(D) = C_{SC}(\infty) = C(P/Q)$ (see Example 2).

• If $D \le D^* = QN/(P + Q + N)$ (the optimal distortion mentioned in Example 1), then $C_{SC}(D) = 0$.

• If $D \ge QN/(Q + N)$ (the minimum distortion achievable when the encoder has no knowledge of
the state), then $C_{SC}(D) = C_{SC}(\infty) = C(P/(Q + N))$, which is achieved by first decoding the codeword
$X^n$ in a "noncoherent" fashion, then utilizing $X^n$ along with the channel output $Y^n$ to estimate $S^n$
(see [31]).

More generally, we have the following.

Proposition 2: The capacity-distortion function of the Gaussian channel with additive Gaussian state
when the state information is strictly causally available at the encoder is

$$C_{SC}(D) = \begin{cases}
0, & 0 \le D < \dfrac{QN}{P+Q+N}, \\[2mm]
C\!\left(\dfrac{(P+Q+N)D - QN}{QN}\right), & \dfrac{QN}{P+Q+N} \le D < \dfrac{QN}{Q+N}, \\[2mm]
C\!\left(\dfrac{P}{Q+N}\right), & D \ge \dfrac{QN}{Q+N}.
\end{cases}$$

Proposition 2 can be proved by evaluating the characterization in Theorem 2 with the optimal choice of
the auxiliary random variable $U$ and the estimation function $\hat{s}(u, x, y)$. However, the alternative
characterization in Remark 3 provides a more direct proof. Since the Wyner-Ziv rate-distortion function [30]
for the Gaussian source $S$ with side information $Y = x + S + Z$ is independent of $x$, it follows immediately
from (6) that $C_{SC}(D) = C_{SC}(\infty) - R_{WZ}(D)$, which is equivalent to the expression given in
Proposition 2.
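
The piecewise expression of Proposition 2 and the rate-splitting form $C_{SC}(\infty) - R_{WZ}(D)$ can be compared numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, rates are in bits, $N > 0$ is required, and the Gaussian Wyner-Ziv function is taken in the form $R_{WZ}(D) = \max\{0, (1/2)\log((QN/(Q+N))/D)\}$, which corresponds to the distortion-rate function $D_{WZ}(R) = (QN/(Q+N))2^{-2R}$ quoted in Example 1.

```python
import math

def gaussian_capacity(x):
    """C(x) = (1/2) log2(1 + x), in bits per transmission."""
    return 0.5 * math.log2(1.0 + x)

def capacity_distortion_gaussian(D, P, Q, N):
    """Piecewise capacity-distortion function of Proposition 2 (requires N > 0)."""
    d_low = Q * N / (P + Q + N)   # minimum achievable distortion D*
    d_high = Q * N / (Q + N)      # distortion achievable without a description
    if D < d_low:
        return 0.0
    if D < d_high:
        return gaussian_capacity(((P + Q + N) * D - Q * N) / (Q * N))
    return gaussian_capacity(P / (Q + N))

def wyner_ziv_rate_gaussian(D, Q, N):
    """Gaussian Wyner-Ziv rate for source S and side information S + Z, D > 0."""
    return max(0.0, 0.5 * math.log2((Q * N / (Q + N)) / D))

def capacity_by_rate_splitting(D, P, Q, N):
    """C_SC(infinity) - R_WZ(D), meaningful for D at least QN / (P + Q + N)."""
    return gaussian_capacity(P / (Q + N)) - wyner_ziv_rate_gaussian(D, Q, N)
```

With the assumed values $P = 1$, $Q = 1$, $N = 2$ and $D = 0.6$, both forms evaluate to $(1/2)\log_2 1.2$; at $D = D^* = 0.5$ both are zero, and for $D \ge 2/3$ both equal $C(1/3)$.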

C. Binary Symmetric Channel with Additive Bernoulli State

Consider the binary symmetric channel

$$Y = X \oplus S \oplus Z,$$

where the state $S \sim \mathrm{Bern}(q)$, $q \in [0, 1/2]$, and the noise $Z \sim \mathrm{Bern}(p)$,
$p \in [0, 1/2]$, are independent of each other. Assume the Hamming distortion measure
$d(s, \hat{s}) = s \oplus \hat{s}$.

We note the following extreme cases of the capacity-distortion function:

• If $p = 0$, then $D^* = 0$ and $C_{SC}(D) = 1 - H(q)$.

• If $q = 0$, then $D^* = 0$ and $C_{SC}(D) = 1 - H(p)$.

• If $p = 1/2$, then $D^* = q$ and $C_{SC}(D) = 0$.

• If $q = 1/2$, then $D^* = p$ and $C_{SC}(D) = 0$.

• If $D \ge q$, then $C_{SC}(D) = C_{SC}(\infty) = 1 - H(p * q) = 1 - H(p(1 - q) + q(1 - p))$.

More generally, we have the following.
More generally, we have the following. 

Proposition 3: The capacity-distortion function of the binary symmetric channel with additive Bernoulli
state when the state information is strictly causally available at the encoder is

$$\begin{aligned}
C_{SC}(D) &= \max_{\alpha, \beta \in [0,1] :\, \alpha\beta + (1-\alpha)q \le D} \big[1 - H(p) - \alpha(H(\beta * q) - H(\beta))\big] \\
&= 1 - H(p * q) - R_{WZ}(D),
\end{aligned}$$

where

$$R_{WZ}(D) = \min_{\alpha, \beta \in [0,1] :\, \alpha\beta + (1-\alpha)q \le D} \big[H(p) - H(p * q) + \alpha(H(\beta * q) - H(\beta))\big] \qquad (8)$$

is the Wyner-Ziv rate-distortion function for the Bernoulli source and Hamming distortion measure.

As in the Gaussian case, the proof of the proposition follows immediately from the alternative
characterization of the capacity-distortion function in Remark 3. Here the Wyner-Ziv rate-distortion
function follows again from [30].

IV. Causal State Communication 

So far in our discussion, we have assumed that the encoder has strictly causal knowledge of the state
sequence. What happens if the encoder has causal knowledge of the state sequence, that is, at time
$i \in [1:n]$ the previous and current state sequence $s^i$ is available at the encoder? Now a $(2^{nR}, n)$
code, probability of error, achievability, and capacity-distortion function are defined as in the strictly
causal case in Section III, except that the encoder is of the form $x_i(m, s^i)$, $i \in [1:n]$.

It turns out that the optimal tradeoff between capacity and distortion can be achieved by a simple 
modification to the block Markov coding scheme for the strictly causal case. 

Theorem 3: The capacity-distortion function for causal state communication is

\[
C_{\mathrm{C}}(D) = \max \big( I(U,V;Y) - I(U,V;S) \big), \tag{9}
\]

where the maximum is over all conditional pmfs $p(v)p(u|v,s)$ with $|\mathcal{V}| \le \min\{(|\mathcal{X}| - 1)|\mathcal{S}| + 1, |\mathcal{Y}|\} + 1$
and $|\mathcal{U}| \le |\mathcal{S}| + 2$ and functions $x(v,s)$ and $\hat{s}(u,v,y)$ such that $E(d(S, \hat{S})) \le D$.

At one extreme point, if $D = \infty$, then the theorem recovers the unconstrained channel capacity

\[
C_{\mathrm{C}}(\infty) = \max_{p(v)p(u|v,s),\, x(v,s)} \big( I(U,V;Y) - I(U,V;S) \big) = \max_{p(v),\, x(v,s)} I(V;Y)
\]



established by Shannon [26]. At the other extreme point, the optimal distortion for causal state
communication is

\[
D^* = \min E(d(S, \hat{S})),
\]

where the minimum is over all conditional pmfs $p(v)p(u|v,s)$ and functions $x(v,s)$ and $\hat{s}(u,v,y)$ such
that

\[
I(U,V;Y) \ge I(U,V;S).
\]

Moreover, the condition for zero Hamming distortion can be shown to be

\[
\max_{p(x|s)} I(X,S;Y) \ge H(S),
\]

which was proved in [17]. Note that by setting $V = X$ in the theorem, we recover the capacity-distortion
function $C_{\mathrm{SC}}(D)$ for strictly causal communication in Theorem 2.

To prove achievability for Theorem 3, we use the Shannon strategy [26] (see also [?, Sec. 7.5]) and
perform encoding over the set of all functions $\{x_v(s): \mathcal{S} \to \mathcal{X}\}$ indexed by $v$ as the input alphabet.
This induces a DMC with DM state $p(y|v,s)p(s) = p(y|x(v,s),s)p(s)$ with the state information strictly
causally available at the encoder, and we can immediately apply Theorem 2 to prove achievability of
$C_{\mathrm{C}}(D)$. For the converse, we identify the auxiliary random variables $V_i = (M, S^{i-1})$ and $U_i = Y_{i+1}^n$,
$i \in [1:n]$. Note that $(U_i, V_i) \to (X_i, S_i) \to Y_i$ form a Markov chain, $V_i$ is independent of $S_i$, and $X_i$ is
a function of $(V_i, S_i)$ as desired. The rest of the proof utilizes Lemma 1 and the concavity of $C_{\mathrm{C}}(D)$,
and follows similar steps to that for the strictly causal case in Appendix B.
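The Shannon strategy used in the achievability argument can be made concrete for finite alphabets. The following Python sketch enumerates all $|\mathcal{X}|^{|\mathcal{S}|}$ functions $x_v: \mathcal{S} \to \mathcal{X}$ and tabulates the induced channel $p(y|v,s) = p(y|x_v(s), s)$. The helper names `shannon_strategies` and `induced_channel`, and the binary example with crossover probability 0.1, are assumptions made for illustration only and do not appear in the paper.

```python
from itertools import product

def shannon_strategies(S, X):
    """All functions x_v: S -> X, each stored as a dict {s: x} (hypothetical helper)."""
    return [dict(zip(S, images)) for images in product(X, repeat=len(S))]

def induced_channel(p_y_given_xs, strategies, S):
    """Return p(y | v, s) = p(y | x_v(s), s), keyed by (v, s) (hypothetical helper)."""
    return {(v, s): p_y_given_xs[(x_v[s], s)]
            for v, x_v in enumerate(strategies)
            for s in S}

# Illustrative channel (an assumption): Y = X xor S xor Z with Z ~ Bern(p).
p = 0.1
S = X = (0, 1)
p_y_given_xs = {(x, s): {y: (1 - p) if y == (x ^ s) else p for y in (0, 1)}
                for x in X for s in S}

strategies = shannon_strategies(S, X)          # 4 functions for binary S and X
channel = induced_channel(p_y_given_xs, strategies, S)

for v, x_v in enumerate(strategies):
    print(v, x_v, [channel[(v, s)] for s in S])
```

In this binary example, the strategy $x_v(s) = s$ makes the induced output distribution the same for both state values, which is the state-cancelling behavior that the binary example later in this section relies on.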

In the following subsections, we illustrate Theorem 3 through simple examples.

A. Gaussian Channel with Additive Gaussian State 

We revisit the Gaussian channel (see Example 1 and Subsection III-B)

\[
Y = X + S + Z.
\]

While the complete characterization of $C_{\mathrm{C}}(D)$ is not known even for the unconstrained case ($D = \infty$),
the optimal distortion can be characterized as

\[
D^* = \frac{QN}{(\sqrt{P} + \sqrt{Q})^2 + N}.
\]






Achievability follows by setting $U = V = \emptyset$, $X = \sqrt{P/Q}\, S$, and $\hat{S} = E(S|Y)$. The converse follows
from the fact that $D^*$ is also the optimal distortion when the state information is known noncausally at
the encoder (see [29]). Knowing the channel state causally thus allows the encoder to choose the channel
input $X$ coherently with the state $S$ and amplify it, unlike the strictly causal case where $X$
and $S$ are independent of each other.
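As a numerical sanity check of the achievability step, the following Python sketch compares the closed-form expression for $D^*$ with the minimum mean squared error obtained directly for $Y = (1 + \sqrt{P/Q})S + Z$. It assumes that $P$ is the input power, $Q$ the state variance, and $N$ the noise variance, with $S$ and $Z$ independent zero-mean Gaussians; the function names are hypothetical.

```python
import math

def causal_min_distortion(P, Q, N):
    """Closed form D* = QN / ((sqrt(P) + sqrt(Q))^2 + N)."""
    return Q * N / ((math.sqrt(P) + math.sqrt(Q)) ** 2 + N)

def mmse_amplified_state(P, Q, N):
    """MMSE of S given Y = a*S + Z with a = 1 + sqrt(P/Q), i.e. X = sqrt(P/Q)*S.

    For jointly Gaussian (S, Y): MMSE = Var(S) - Cov(S, Y)^2 / Var(Y).
    """
    a = 1 + math.sqrt(P / Q)
    return Q - (a * Q) ** 2 / (a * a * Q + N)

def mmse_no_amplification(Q, N):
    """MMSE of S given Y = S + Z (the input X carries no state information)."""
    return Q * N / (Q + N)

P, Q, N = 1.0, 2.0, 0.5   # illustrative values, not taken from the paper
print(causal_min_distortion(P, Q, N), mmse_amplified_state(P, Q, N))
print(mmse_no_amplification(Q, N))
```

The first two printed values agree, and both are smaller than the third, which reflects the gain from amplifying the state.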

B. Binary Symmetric Channel with Additive Bernoulli State 
We revisit the binary symmetric channel (see Subsection III-C)

\[
Y = X \oplus S \oplus Z,
\]

where $S \sim \mathrm{Bern}(q)$ and $Z \sim \mathrm{Bern}(p)$ are independent of each other.
We note the following extreme cases of the capacity-distortion function: 

• If $p = 0$, then $D^* = 0$ and $C_{\mathrm{C}}(D) = 1 - H(q)$.

• If $q = 0$, then $D^* = 0$ and $C_{\mathrm{C}}(D) = 1 - H(p)$.

• If $p = 1/2$, then $D^* = q$ and $C_{\mathrm{C}}(D) = 0$.

• If $D \ge q$, then $C_{\mathrm{C}}(D) = C_{\mathrm{C}}(\infty) = 1 - H(p)$, which is achieved by canceling the state at the
encoder ($X = V \oplus S$).

In general, the capacity-distortion function is given by the following proposition. 
Proposition 4: The capacity-distortion function of the binary symmetric channel with additive Bernoulli
state when the state information is causally available at the encoder is

\[
C_{\mathrm{C}}(D) = 1 - H(p) - H(q) + H(D), \quad D \le q.
\]

Proof: For the proof of achievability, observe that if we cancel the state at the encoder and split the
unconstrained capacity into information transmission and lossy source coding of the past state sequence
(without side information since $V$ and $Y$ are independent of $S$), then $C_{\mathrm{C}}(\infty) - R(D) = (1 - H(p)) -
(H(q) - H(D))$ is achievable. This corresponds to evaluating Theorem 2 with $X = V \oplus S$, $U = V \oplus S \oplus \tilde{S}$,
and $\hat{S} = U \oplus V = S \oplus \tilde{S}$, where $V \sim \mathrm{Bern}(1/2)$ and $\tilde{S} \sim \mathrm{Bern}(D)$ are independent of $S$. (Note the
similarity to rate splitting for the strictly causal case discussed in Remark 3.)

For the proof of the converse, consider

\begin{align*}
I(U,V;Y) - I(U,V;S) &= I(U,V,S;Y) - I(U,V,Y;S) \\
&= H(Y) - H(Y|U,V,S) - H(S) + H(S|U,V,Y) \\
&\overset{(a)}{=} H(Y) - H(Y|X,S) - H(S) + H(S|U,V,Y) \\
&\overset{(b)}{=} H(Y) - H(Y|X,S) - H(S) + H(S \oplus \hat{S}|U,V,Y) \\
&\le 1 - H(p) - H(q) + H(S \oplus \hat{S}) \\
&\le 1 - H(p) - H(q) + H(D),
\end{align*}

where (a) follows since $X$ is a function of $(V,S)$ and $(U,V) \to (X,S) \to Y$ form a Markov chain, and
(b) follows since $\hat{S}$ is a function of $(U,V,Y)$. This completes the proof of the proposition. ■
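To make the tradeoff in Proposition 4 easier to inspect, the following Python sketch evaluates $C_{\mathrm{C}}(D)$ and, for comparison, the unconstrained strictly causal capacity $1 - H(p * q)$ listed among the extreme cases in Section III. It is only a sketch under stated assumptions: the function names are hypothetical, entropies are in bits, and the expression is saturated at $1 - H(p)$ for $D \ge q$ as in the extreme cases above. The sketch does not compute $D^*$; since a capacity cannot be negative, values of $D$ for which the expression is negative lie outside the achievable range.

```python
import math

def binary_entropy(x):
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def binary_convolution(p, q):
    """p * q = p(1 - q) + q(1 - p)."""
    return p * (1 - q) + q * (1 - p)

def causal_capacity_distortion(D, p, q):
    """C_C(D) = 1 - H(p) - H(q) + H(D) for D <= q, and 1 - H(p) for D >= q."""
    if D >= q:
        return 1 - binary_entropy(p)
    return 1 - binary_entropy(p) - binary_entropy(q) + binary_entropy(D)

def strictly_causal_unconstrained(p, q):
    """C_SC(infinity) = 1 - H(p * q)."""
    return 1 - binary_entropy(binary_convolution(p, q))

p, q = 0.1, 0.3   # illustrative parameters, not taken from the paper
for D in (0.1, 0.2, 0.3, 0.5):
    print(D, causal_capacity_distortion(D, p, q))
print("strictly causal, D = infinity:", strictly_causal_unconstrained(p, q))
```

For these parameters the rate increases with $D$ until it saturates at $1 - H(p)$, which exceeds the strictly causal value $1 - H(p * q)$ because the causal encoder can cancel the state.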

C. Five-Card Trick 

We next consider the classical five-card trick. Two information theorists, Alice and Bob, perform a
"magic" trick with a shuffled deck of $N$ cards, numbered from $0$ to $N - 1$. Alice asks a member of the
audience to select $K$ cards at random from the deck. The audience member passes the $K$ cards to Alice,
who examines them and hands one back. Alice then arranges the remaining $K - 1$ cards in some order
and places them face down in a neat pile. Bob, who has not witnessed these proceedings, then enters the
room, looks at the $K - 1$ cards, and determines the missing $K$-th card, held by the audience member.
There are two key questions:

• Given $K$, what is the maximum number of cards $N$ for which this trick can be performed?

• How is this trick performed?

This trick (discussed in [?], [23]) can be formulated as state communication at zero Hamming distortion
with causal state knowledge at the encoder.

Proposition 5: The maximum number of cards $N$ for which the trick could be performed is $K! + K - 1$.
Proof: To show that the maximum cannot be larger than $K! + K - 1$, that is, to prove the converse,
we suppose that multiple rounds of the trick were to be performed. In the framework of causal state
communication, the state $S$ corresponds to an unordered tuple of $K$ cards selected by the audience
member, which is uniformly distributed over all possible choices of $K$ cards. The channel input $X$ (as
well as the channel output $Y$) corresponds to the ordered tuple of $K - 1$ cards placed and received,
respectively, by Alice and Bob. Since Bob has to recover the missing card losslessly, the problem is
equivalent to reproducing the state $S$ itself with zero Hamming distortion (by combining the remaining
card with the received $K - 1$ cards).

Now by Theorem 3, the necessary condition for zero Hamming distortion is given by

\[
\max_{p(x|s)} H(X) - H(S) \ge 0,
\]

or equivalently,

\[
\max_{p(x|s)} \big( H(X|S) - H(S|X) \big) \ge 0. \tag{10}
\]

Since $S$ is uniform and the maximum is attained by the (conditionally) uniform $X$, the condition in (10)
simplifies to

\[
\log(K!) \ge \log(N - (K - 1)),
\]

or equivalently,

\[
N \le K! + K - 1.
\]

We now show that we only need one round of communication to achieve this upper bound on causal
state communication. Without loss of generality, assume that the selected cards $(c_0, \ldots, c_{K-1})$ are ordered
with $c_0 < c_1 < \cdots < c_{K-1}$. Alice selects card $c_i$ to hand back to the audience, where $i = c_0 + c_1 + \cdots +
c_{K-1} \pmod{K}$. Observe that

\[
c_0 + c_1 + \cdots + c_{K-1} = K r_1 + i, \tag{11}
\]

for some integer $r_1$. The remaining $K - 1$ cards $(c_{j_1}, \ldots, c_{j_{K-1}})$ ($c_{j_0} = c_i$ is the deleted card) are summed
and decomposed, i.e.,

\[
c_{j_1} + c_{j_2} + \cdots + c_{j_{K-1}} = K r_2 + s, \tag{12}
\]

for some integer $r_2$. Since all the $K$ cards sum to $i \pmod{K}$, the missing card $c_{j_0} = c_i$ must be
congruent to $-s + i \pmod{K}$. Thus

\[
c_{j_0} = c_i = K(r_1 - r_2) - s + i. \tag{13}
\]

Therefore, if we renumber the $N - (K - 1)$ cards from $0$ to $K! - 1$ (by removing the $K - 1$ retained
cards), the hidden card's new number is congruent to $-s \pmod{K}$, as the hidden card's new number
$c_i - i$ is equal to $K(r_1 - r_2) - s$. But there are exactly $(K - 1)!$ possibilities remaining for the hidden
card's number, which can be conveyed by a predetermined permutation of the $K - 1$ retained cards. This
completes the achievability proof. ■
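The achievability argument translates directly into a procedure for Alice and Bob. The Python sketch below is one possible realization: the "predetermined permutation" is taken to be the lexicographic rank among the permutations of the retained cards, which is an assumption of this sketch, as are the function names `alice`, `bob`, and `max_deck_size`.

```python
import random
from itertools import combinations, permutations
from math import factorial

def max_deck_size(K):
    """Largest deck size from Proposition 5: N = K! + K - 1."""
    return factorial(K) + K - 1

def alice(hand, K):
    """Return (hidden card, ordered pile of the K - 1 retained cards)."""
    c = sorted(hand)
    i = sum(c) % K                       # index of the card handed back
    hidden = c[i]
    retained = c[:i] + c[i + 1:]         # still in increasing order
    s = sum(retained) % K
    m = hidden - i                       # hidden card's number after renumbering
    t = (m - (-s) % K) // K              # m = K*t + ((-s) mod K), 0 <= t < (K-1)!
    pile = list(permutations(retained))[t]   # t-th permutation, lexicographic order
    return hidden, list(pile)

def bob(pile, K, N):
    """Recover the hidden card from the ordered pile."""
    retained = sorted(pile)
    t = list(permutations(retained)).index(tuple(pile))
    s = sum(retained) % K
    m = K * t + (-s) % K                 # renumbered value of the hidden card
    remaining = [card for card in range(N) if card not in retained]
    return remaining[m]

K = 5
N = max_deck_size(K)                     # 124 cards for K = 5
hand = random.Random(0).sample(range(N), K)
hidden, pile = alice(hand, K)
print(hand, "->", pile, "hidden:", hidden, "decoded:", bob(pile, K, N))
```

Enumerating all $(K-1)!$ permutations is adequate for small $K$; a larger $K$ would call for ranking and unranking permutations directly.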
V. Concluding Remarks

The problem of joint information transmission and channel state estimation over a DMC with DM
state was studied in [31] (no state information at the encoder) and [28], [29] (full state information at
the encoder). In this paper, we bridged the temporal gap between these two results by studying the case
in which the encoder has strictly causal or causal knowledge of the channel state information.

The resulting capacity-distortion function permits a systematic investigation of the tradeoff between
information transmission and state estimation. We showed that block Markov coding, coupled with
channel state estimation that treats the decoded message and received channel output as side information
at the decoder, is optimal for communicating the state. Additional information transmission requires a
simple rate-splitting strategy. We also showed that the capacity-distortion function when the encoder is
oblivious of the state information (see [31]) can be recovered from our result.

Finally, we recall an important open problem of finding the capacity-distortion function $C_{\mathrm{NC}}(D)$ for a
general DMC with DM state with an arbitrary distortion measure, when the state sequence is noncausally
available at the encoder. The problem was studied in [28], which established a lower bound on $C_{\mathrm{NC}}(D)$ as

\[
C_{\mathrm{NC}}(D) \ge \max \big( I(U;Y) - I(U;S) \big), \tag{14}
\]

where the maximum is over all conditional pmfs $p(u|s)$ and functions $x(u,s)$ and $\hat{s}(u,y)$ such that
$E(d(S, \hat{S})) \le D$. While it is believed that this lower bound is tight in general (see, for example, [29]
for the case of Gaussian channels with additive Gaussian states with quadratic distortion measure), the
proof of the converse seems beyond our current techniques of identifying auxiliary random variables and
using estimation-theoretic inequalities such as Lemma 1.

Appendix A
Proof of Lemma 1

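Using the law of iterated expectations, we have

\[
E[d(Z, \hat{z}(V,W))] = E_V\big[ E[d(Z, \hat{z}(V,W)) \,|\, V] \big]. \tag{15}
\]

Now, for each $v \in \mathcal{V}$,

\begin{align*}
E[d(Z, \hat{z}(V,W)) \,|\, V = v] &= \sum_{z \in \mathcal{Z},\, w \in \mathcal{W}} p(z|v)\, p(w|v)\, d(z, \hat{z}(v,w)) \\
&= \sum_{w \in \mathcal{W}} p(w|v) \sum_{z \in \mathcal{Z}} p(z|v)\, d(z, \hat{z}(v,w)) \\
&\ge \min_{w \in \mathcal{W}} \sum_{z \in \mathcal{Z}} p(z|v)\, d(z, \hat{z}(v,w)), \tag{16}
\end{align*}

where $w^*(v)$ attains the minimum in (16) for a given $v$. Define $\hat{z}^*(v) = \hat{z}(v, w^*(v))$. Then (15) becomes

\begin{align*}
E[d(Z, \hat{z}(V,W))] &= E_V\big[ E[d(Z, \hat{z}(V,W)) \,|\, V] \big] \\
&\ge E_V\Big[ \sum_{z \in \mathcal{Z}} p(z|V)\, d(z, \hat{z}^*(V)) \Big] \\
&= E[d(Z, \hat{z}^*(V))],
\end{align*}

which completes the proof.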
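The argument above can be checked numerically on small alphabets. The Python sketch below assumes, as the first line of the derivation does, that $Z$ and $W$ are conditionally independent given $V$. It draws random pmfs and an arbitrary estimator, builds the reduced estimator by picking a minimizing $w^*(v)$ for each $v$, and compares the two expected distortions. All names, alphabet sizes, and the Hamming distortion are illustrative assumptions.

```python
import random

def random_pmf(k, rng):
    """A random probability vector of length k."""
    xs = [rng.random() for _ in range(k)]
    total = sum(xs)
    return [x / total for x in xs]

def distortion_with_w(p_v, p_z_v, p_w_v, est, d):
    """E[d(Z, est[v][w])] with Z and W conditionally independent given V."""
    return sum(p_v[v] * p_z_v[v][z] * p_w_v[v][w] * d(z, est[v][w])
               for v in range(len(p_v))
               for z in range(len(p_z_v[v]))
               for w in range(len(p_w_v[v])))

def reduce_estimator(p_z_v, est, d):
    """For each v, keep est[v][w*(v)], where w*(v) minimizes sum_z p(z|v) d(z, est[v][w])."""
    reduced = []
    for v, row in enumerate(est):
        def cost(w, v=v, row=row):
            return sum(pz * d(z, row[w]) for z, pz in enumerate(p_z_v[v]))
        reduced.append(row[min(range(len(row)), key=cost)])
    return reduced

def distortion_v_only(p_v, p_z_v, est_v, d):
    """E[d(Z, est_v[V])] for an estimator that ignores W."""
    return sum(p_v[v] * p_z_v[v][z] * d(z, est_v[v])
               for v in range(len(p_v))
               for z in range(len(p_z_v[v])))

def hamming(a, b):
    return 0 if a == b else 1

rng = random.Random(0)
nv, nz, nw = 3, 4, 5
p_v = random_pmf(nv, rng)
p_z_v = [random_pmf(nz, rng) for _ in range(nv)]
p_w_v = [random_pmf(nw, rng) for _ in range(nv)]
est = [[rng.randrange(nz) for _ in range(nw)] for _ in range(nv)]

with_w = distortion_with_w(p_v, p_z_v, p_w_v, est, hamming)
without_w = distortion_v_only(p_v, p_z_v, reduce_estimator(p_z_v, est, hamming), hamming)
print(with_w, without_w)
```

In every such draw the second value is no larger than the first, in agreement with the inequality in (16).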

Appendix B 
Proof of Theorem 2

Before proving Theorem 2, we summarize a few useful properties of $C_{\mathrm{SC}}(D)$ in Lemma 2. Similar
properties of the capacity-distortion function for the case in which the
channel state information is not available were discussed in [31].

Lemma 2: The capacity-distortion function $C_{\mathrm{SC}}(D)$ in Theorem 2 has the following properties:

(1) $C_{\mathrm{SC}}(D)$ is a nondecreasing concave function of $D$ for all $D \ge D^*$.

(2) $C_{\mathrm{SC}}(D)$ is a continuous function of $D$ for all $D > D^*$.

(3) $C_{\mathrm{SC}}(D^*) = 0$ if $D^* \ne 0$ and $C_{\mathrm{SC}}(D^*) \ge 0$ if $D^* = 0$.

The monotonicity is trivial. The concavity can be shown by using the standard time-sharing argument.
The continuity is a direct consequence of the concavity. The last property follows from Section IV. With
these properties in hand, let us prove Theorem 2.

A. Proof of Achievability

We use $b$ transmission blocks, each consisting of $n$ symbols. The encoder uses a rate-splitting technique,
whereby in block $j$, it appropriately allocates its rate between transmitting independent information and
a description of the state sequence $S^n(j-1)$ in block $j - 1$.

Codebook generation. Fix a conditional pmf $p(x)p(u|x,s)$ and function $\hat{s}(u,x,y)$ that attain $C_{\mathrm{SC}}(D/(1+\epsilon))$,
where $D$ is the desired distortion, and let $p(u|x) = \sum_s p(s)p(u|x,s)$. For each $j \in [1:b]$, randomly
and independently generate $2^{n(R+R_S)}$ sequences $x^n(m_j, l_{j-1})$, $m_j \in [1:2^{nR}]$, $l_{j-1} \in [1:2^{nR_S}]$,
each according to $\prod_{i=1}^n p_X(x_i)$. For each $m_j \in [1:2^{nR}]$, $l_{j-1} \in [1:2^{nR_S}]$, randomly and
conditionally independently generate $2^{n\tilde{R}_S}$ sequences $u^n(k_j|m_j, l_{j-1})$, $k_j \in [1:2^{n\tilde{R}_S}]$, each according to
$\prod_{i=1}^n p_{U|X}(u_i|x_i(m_j, l_{j-1}))$. Partition the set of indices $k_j \in [1:2^{n\tilde{R}_S}]$ into equal-size bins $\mathcal{B}(l_j) =
[(l_j - 1)2^{n(\tilde{R}_S - R_S)} + 1 : l_j 2^{n(\tilde{R}_S - R_S)}]$, $l_j \in [1:2^{nR_S}]$. This defines the codebook

\[
\mathcal{C}_j = \big\{ \big(x^n(m_j, l_{j-1}), u^n(k_j|m_j, l_{j-1})\big) : m_j \in [1:2^{nR}],\ l_{j-1} \in [1:2^{nR_S}],\ k_j \in [1:2^{n\tilde{R}_S}] \big\}, \quad j \in [1:b].
\]

The codebook is revealed to both the encoder and the decoder.

Encoding. By convention, let $l_0 = 1$. At the end of block $j$, the encoder finds an index $k_j$ such that

\[
\big(s^n(j), u^n(k_j|m_j, l_{j-1}), x^n(m_j, l_{j-1})\big) \in \mathcal{T}_{\epsilon'}^{(n)}.
\]

If there is more than one such index, it selects one of them uniformly at random. If there is no such
index, it selects an index from $[1:2^{n\tilde{R}_S}]$ uniformly at random. In block $j + 1$ the encoder transmits
$x^n(m_{j+1}, l_j)$, where $m_{j+1}$ is the new message index to be sent in block $j + 1$ and $l_j$ is the bin index of
$k_j$.

Decoding. Let $\epsilon > \epsilon'$. At the end of block $j + 1$, the decoder finds the unique index pair $(\hat{m}_{j+1}, \hat{l}_j)$ such that
$(x^n(\hat{m}_{j+1}, \hat{l}_j), y^n(j+1)) \in \mathcal{T}_{\epsilon}^{(n)}$. The decoder thus decodes the message index $\hat{m}_{j+1}$ in block $j + 1$. It
then finds the unique index $\hat{k}_j \in \mathcal{B}(\hat{l}_j)$ such that $(u^n(\hat{k}_j|\hat{m}_j, \hat{l}_{j-1}), x^n(\hat{m}_j, \hat{l}_{j-1}), y^n(j)) \in \mathcal{T}_{\epsilon}^{(n)}$. Finally
it computes the reconstruction sequence as $\hat{s}_i(j) = \hat{s}(u_i(\hat{k}_j|\hat{m}_j, \hat{l}_{j-1}), x_i(\hat{m}_j, \hat{l}_{j-1}), y_i(j))$ for $i \in [1:n]$.
Following the analysis of minimum distortion in Section II, it can be readily shown that the scheme
can achieve any rate up to the capacity-distortion function given in Theorem 2.
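The binning step of the codebook construction is simple index arithmetic. As a minimal sketch, the Python code below partitions description indices $1, \ldots, $ `num_indices` into `num_bins` equal-size consecutive bins and maps an index $k_j$ to its bin index $l_j$. The parameter names and the small example sizes are assumptions for illustration; in the scheme above the two counts are exponential in $n$.

```python
def bin_size(num_indices, num_bins):
    """Size of each bin; the partition is into equal-size bins."""
    if num_indices % num_bins != 0:
        raise ValueError("num_indices must be a multiple of num_bins")
    return num_indices // num_bins

def bin_of(k, size):
    """Bin index l (1-based) such that k lies in [(l - 1) * size + 1 : l * size]."""
    return (k - 1) // size + 1

def bin_members(l, size):
    """All indices k in bin l, i.e. (l - 1) * size + 1, ..., l * size."""
    return range((l - 1) * size + 1, l * size + 1)

# Illustrative sizes (assumed): 64 description indices split into 4 bins of 16.
size = bin_size(64, 4)
print(bin_of(1, size), bin_of(16, size), bin_of(17, size), bin_of(64, size))
print(list(bin_members(2, size)))
```

The encoder only needs `bin_of` to obtain the index it forwards in the next block, while the decoder searches over `bin_members` of the decoded bin.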

B. Proof of the Converse 

We need to show that given any sequence of $(2^{nR}, n)$ codes with $\lim_{n \to \infty} P_e^{(n)} = 0$ and $E(d(S^n, \hat{S}^n)) \le
D$, we must have $R \le C_{\mathrm{SC}}(D)$. We identify the auxiliary random variables $U_i := (M, S^{i-1}, Y_{i+1}^n)$,
$i \in [1:n]$, with $S^0 = Y_{n+1}^n = \emptyset$. Note that, as desired, $U_i \to (X_i, S_i) \to Y_i$ form a Markov chain for
$i \in [1:n]$. Consider

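\begin{align*}
nR &= H(M) \\
&\overset{(a)}{\le} I(M; Y^n) + n\epsilon_n \\
&= \sum_{i=1}^n I(M; Y_i | Y_{i+1}^n) + n\epsilon_n \\
&\le \sum_{i=1}^n I(M, Y_{i+1}^n; Y_i) + n\epsilon_n \\
&= \sum_{i=1}^n I(M, Y_{i+1}^n, S^{i-1}; Y_i) - \sum_{i=1}^n I(S^{i-1}; Y_i | M, Y_{i+1}^n) + n\epsilon_n \\
&\overset{(b)}{=} \sum_{i=1}^n I(M, Y_{i+1}^n, S^{i-1}, X_i; Y_i) - \sum_{i=1}^n I(S^{i-1}; Y_i | M, Y_{i+1}^n) + n\epsilon_n \\
&\overset{(c)}{=} \sum_{i=1}^n I(M, Y_{i+1}^n, S^{i-1}, X_i; Y_i) - \sum_{i=1}^n I(Y_{i+1}^n; S_i | M, S^{i-1}) + n\epsilon_n \\
&\overset{(b)}{=} \sum_{i=1}^n I(M, Y_{i+1}^n, S^{i-1}, X_i; Y_i) - \sum_{i=1}^n I(Y_{i+1}^n; S_i | M, S^{i-1}, X_i) + n\epsilon_n \\
&\overset{(d)}{=} \sum_{i=1}^n I(M, Y_{i+1}^n, S^{i-1}, X_i; Y_i) - \sum_{i=1}^n I(M, S^{i-1}, X_i, Y_{i+1}^n; S_i) + n\epsilon_n \\
&= \sum_{i=1}^n I(U_i, X_i; Y_i) - \sum_{i=1}^n I(U_i, X_i; S_i) + n\epsilon_n, \tag{17}
\end{align*}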

i=i i=i 

where (a) follows by Fano's inequality [8, Theorem 7.7.1], which states that $H(M|Y^n) \le n\epsilon_n$ for
some $\epsilon_n \to 0$ as $n \to \infty$ for any code satisfying $\lim_{n \to \infty} P_e^{(n)} = 0$, (b) follows since $X_i$ is a function
of $(M, S^{i-1})$, (c) follows by the Csiszár sum identity [?], [13], [?, Sec. 2.3], and (d) follows since
$(M, S^{i-1}, X_i)$ is independent of $S_i$. So now we have

\begin{align*}
R &\le \frac{1}{n} \sum_{i=1}^n I(U_i, X_i; Y_i) - \frac{1}{n} \sum_{i=1}^n I(U_i, X_i; S_i) + \epsilon_n \\
&\overset{(a)}{\le} \frac{1}{n} \sum_{i=1}^n C_{\mathrm{SC}}\big( E(d(S_i, \hat{s}_i(U_i, X_i, Y_i))) \big) + \epsilon_n \\
&\overset{(b)}{\le} C_{\mathrm{SC}}\Big( \frac{1}{n} \sum_{i=1}^n E(d(S_i, \hat{s}_i(U_i, X_i, Y_i))) \Big) + \epsilon_n \\
&\overset{(c)}{\le} C_{\mathrm{SC}}(D), \tag{18}
\end{align*}

where (a) follows from the definition of the capacity-distortion function, (b) follows by the concavity of
$C_{\mathrm{SC}}(D)$ (see Property 1 in Lemma 2), and (c) follows from Lemmas 1 and 2. This completes the proof
of Theorem 2.